Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches
Authors
Abstract
Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of speech in two simultaneous speeches is too poor to apply these techniques. Therefore, novel techniques need to be developed. One candidate is to use speech stream segregation as a front-end of automatic speech recognition systems. Preliminary experiments on understanding two simultaneous speeches with or without an additional interfering sound show that the proposed challenge problem will be feasible with speech stream segregation. The detailed plan of the research on and benchmark sounds for the proposed challenge problem is also presented.
Similar resources
Understanding Three Simultaneous Speeches
Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of s...
Combining Independent Component Analysis and Sound Stream Segregation
This paper reports the issues and results of the AI Challenge "Understanding Three Simultaneous Speeches". First, the issues of the Challenge are revisited. We emphasize the importance of information fusion of various attributes of speeches (sounds) in separating speeches from a mixture of sounds. This emphasis is supported by comparing two methods of speech separation: computational auditory scene...
Effects of increasing modalities in understanding three simultaneous speeches with two microphones
This paper reports the effects of increasing modalities in understanding three simultaneous speeches with two microphones. This problem is difficult because the beamforming technique adopted for a microphone array needs at least four microphones, and because independent component analysis adopted for blind source separation needs at least three microphones. We investigate four cases: monaural (one ...
Computational Scene Analysis
A remarkable achievement of the perceptual system is its scene analysis capability, which involves two basic perceptual processes: the segmentation of a scene into a set of coherent patterns (objects) and the recognition of memorized ones. Although the perceptual system performs scene analysis with apparent ease, computational scene analysis remains a tremendous challenge as foreseen by Frank R...
Separating three simultaneous speeches with two microphones by integrating auditory and visual processing
This paper addresses the problem of automatic recognition of three simultaneous speeches with two microphones, that is, that of sound source separation where the number of sound sources is greater than that of microphones. The approach used is the direction-pass filter, which is implemented by hypothetical reasoning on the interaural phase difference (IPD) and interaural intensity difference (I...